
    MPCI: An R Package for Computing Multivariate Process Capability Indices

    Manufacturing processes are often based on more than one quality characteristic. When these variables are correlated, process capability analysis should be performed using multivariate statistical methodologies. Although there is growing interest in methods for evaluating the capability of multivariate processes, little attention has been given to developing user-friendly software to support multivariate capability analysis. In this work we introduce the MPCI package for R, which allows the computation of multivariate process capability indices. MPCI aims to provide a useful tool for dealing with multivariate capability assessment problems. We illustrate the use of the MPCI package through both simulated and real examples.
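    The package's own functions are not quoted in the abstract, so the following is only a minimal sketch of the kind of quantity a multivariate capability assessment summarises: the probability that a correlated bivariate process stays within rectangular specification limits, estimated by Monte Carlo under an assumed multivariate normal model. The means, covariances, and specification limits below are invented for illustration and are not taken from MPCI.

```python
# Illustrative sketch (not the MPCI package itself): estimate, by Monte
# Carlo, the probability that a correlated bivariate process stays within
# rectangular specification limits, assuming multivariate normality.
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([10.0, 5.0])              # process mean (assumed)
sigma = np.array([[0.04, 0.015],        # process covariance (assumed)
                  [0.015, 0.02]])
lsl = np.array([9.5, 4.6])              # lower specification limits (assumed)
usl = np.array([10.5, 5.4])             # upper specification limits (assumed)

x = rng.multivariate_normal(mu, sigma, size=200_000)
inside = np.all((x >= lsl) & (x <= usl), axis=1)
print(f"Estimated conforming fraction: {inside.mean():.4f}")
```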

    Graph Neural Network-Based Anomaly Detection for River Network Systems

    Water is the lifeblood of river networks, and its quality plays a crucial role in sustaining both aquatic ecosystems and human societies. Real-time monitoring of water quality increasingly relies on in-situ sensor technology. Anomaly detection is crucial for identifying erroneous patterns in sensor data, but it can be a challenging task due to the complexity and variability of the data, even under normal conditions. This paper presents a solution to the challenging task of anomaly detection for river network sensor data, which is essential for accurate and continuous monitoring. We use a graph neural network model, the recently proposed Graph Deviation Network (GDN), which employs graph attention-based forecasting to capture the complex spatio-temporal relationships between sensors. We propose an alternative anomaly scoring method, GDN+, based on the learned graph. To evaluate the model's efficacy, we introduce new benchmarking simulation experiments with highly sophisticated dependency structures and subsequence anomalies of various types. We further examine the strengths and weaknesses of this baseline approach, GDN, in comparison with other benchmarking methods on complex real-world river network data. Findings suggest that GDN+ outperforms the baseline approach on high-dimensional data, while also providing improved interpretability. We also introduce software called gnnad.
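    As a rough illustration of the deviation-scoring idea (not the GDN/gnnad implementation itself), the sketch below robustly normalises each sensor's forecast error and takes the maximum across sensors as the anomaly score for each time step. A naive persistence forecast stands in for the graph attention-based forecaster, and the data, threshold, and injected anomaly are invented for the example.

```python
# Sketch of deviation-style anomaly scoring: scale each sensor's forecast
# error by robust statistics, then let the worst sensor drive the score.
import numpy as np

def anomaly_scores(observed: np.ndarray, forecast: np.ndarray) -> np.ndarray:
    """observed, forecast: arrays of shape (time, sensors)."""
    err = np.abs(observed - forecast)
    med = np.median(err, axis=0)
    q75, q25 = np.percentile(err, [75, 25], axis=0)
    normalised = (err - med) / (q75 - q25 + 1e-8)   # robust per-sensor scaling
    return normalised.max(axis=1)                   # worst sensor drives the score

# toy usage: a persistence forecast stands in for the GNN forecaster
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 8))
data[300, 3] += 8.0                                 # inject a point anomaly
forecast = np.vstack([data[:1], data[:-1]])         # predict "same as previous step"
scores = anomaly_scores(data, forecast)
print("flagged time steps:", np.where(scores > 4.0)[0])
```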

    Correcting misclassification errors in crowdsourced ecological data: A Bayesian perspective

    Many research domains use data elicited from "citizen scientists" when a direct measure of a process is expensive or infeasible. However, participants may report incorrect estimates or classifications due to their lack of skill. We demonstrate how Bayesian hierarchical models can be used to learn about latent variables of interest while accounting for the participants' abilities. The model is described in the context of an ecological application that involves crowdsourced classifications of georeferenced coral-reef images from the Great Barrier Reef, Australia. The latent variable of interest is the proportion of coral cover, which is a common indicator of coral reef health. The participants' abilities are expressed in terms of the sensitivity and specificity with which they correctly classify points on the images. The model also incorporates a spatial component, which allows prediction of the latent variable at locations that have not been surveyed. We show that the model outperforms traditional weighted-regression approaches used to account for uncertainty in citizen science data. Our approach produces more accurate regression coefficients and provides a better characterization of the latent process of interest. This new method is implemented in the probabilistic programming language Stan and can be applied to a wide range of problems that rely on uncertain citizen science data.
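    A minimal sketch of the misclassification mechanism described above, with assumed values (not the paper's Stan model): a participant reports "coral" with a probability that mixes the latent truth with their sensitivity and specificity, and a closed-form correction recovers the latent cover when those two quantities are known. The full model instead treats abilities and cover as latent and estimates them jointly.

```python
# Simulate misclassified point classifications and correct the naive estimate
# using assumed sensitivity and specificity values.
import numpy as np

rng = np.random.default_rng(2)

true_cover = 0.35          # latent proportion of coral cover (assumed)
sensitivity = 0.85         # P(report coral | point is coral) (assumed)
specificity = 0.90         # P(report not coral | point is not coral) (assumed)

n_points = 2000
is_coral = rng.random(n_points) < true_cover
p_report = np.where(is_coral, sensitivity, 1.0 - specificity)
reported = rng.random(n_points) < p_report

naive = reported.mean()
corrected = (naive - (1 - specificity)) / (sensitivity + specificity - 1)
print(f"naive: {naive:.3f}  corrected: {corrected:.3f}  truth: {true_cover}")
```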

    Bayesian spatio-temporal models for stream networks

    Spatio-temporal models are widely used in many research areas, including ecology. The recent proliferation of in-situ sensors in streams and rivers supports space-time water quality modelling and monitoring in near real-time. In this paper, we introduce a new family of dynamic spatio-temporal models in which spatial dependence is established based on stream distance and temporal autocorrelation is incorporated using vector autoregression approaches. We propose several variations of these novel models within a Bayesian framework. Our results show that the proposed models perform well on spatio-temporal data collected from real stream networks, particularly in terms of out-of-sample RMSPE. This is illustrated with a case study of water temperature data in the northwestern United States.
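    The sketch below illustrates the two ingredients named in the abstract under strong simplifying assumptions (it is not the paper's model): spatial dependence that decays exponentially with stream distance, and a first-order autoregression linking consecutive time points. The distance matrix and parameter values are invented for the example.

```python
# Simulate a simple space-time process: exponential covariance in stream
# distance for the spatial part, AR(1) dependence in time.
import numpy as np

rng = np.random.default_rng(3)

# symmetric matrix of pairwise stream (hydrologic) distances, assumed known
stream_dist = np.array([[0.0, 2.0, 5.0, 7.0],
                        [2.0, 0.0, 3.0, 5.0],
                        [5.0, 3.0, 0.0, 2.0],
                        [7.0, 5.0, 2.0, 0.0]])

sigma2, alpha, rho = 1.0, 4.0, 0.7            # variance, range, AR(1) coefficient
cov = sigma2 * np.exp(-stream_dist / alpha)   # spatial covariance
chol = np.linalg.cholesky(cov)

T, n = 50, stream_dist.shape[0]
y = np.zeros((T, n))
y[0] = chol @ rng.normal(size=n)
for t in range(1, T):
    innovation = chol @ rng.normal(size=n) * np.sqrt(1 - rho**2)
    y[t] = rho * y[t - 1] + innovation        # AR(1) in time, correlated in space
print("lag-1 autocorrelation at site 0:", np.corrcoef(y[1:, 0], y[:-1, 0])[0, 1])
```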

    Increasing trust in new data sources: crowdsourcing image classification for ecology

    Crowdsourcing methods facilitate the production of scientific information by non-experts. This form of citizen science (CS) is becoming a key source of complementary data in many fields, used to inform data-driven decisions and to study challenging problems. However, concerns about the validity of these data often constrain their utility. In this paper, we focus on the use of citizen science data in addressing complex challenges in environmental conservation. We consider this issue from three perspectives. First, we present a literature scan of papers that have employed Bayesian models with citizen science data in ecology. Second, we compare several popular majority vote algorithms and introduce a Bayesian item response model that estimates and accounts for participants' abilities after adjusting for the difficulty of the images they have classified. The model also enables participants to be clustered into groups based on ability. Third, we apply the model in a case study involving the classification of corals from underwater images of the Great Barrier Reef, Australia. We show that the model achieved superior results in general and that, for difficult tasks, a weighted consensus method using only the groups of expert and experienced participants produced better performance measures. Moreover, we found that participants learn as they have more classification opportunities, which substantially increases their abilities over time. Overall, the paper demonstrates the feasibility of CS for answering complex and challenging ecological questions when these data are appropriately analysed. This serves as motivation for future work to increase the efficacy and trustworthiness of this emerging source of data.
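    A toy sketch of the two ideas being compared, with invented abilities and difficulties: an item-response style probability of a correct classification, and a simple majority vote contrasted with a consensus that weights participants by their estimated accuracy. It is not the paper's Bayesian item response model, only the intuition behind it.

```python
# Compare a plain majority vote with an ability-weighted consensus on
# simulated binary classifications.
import numpy as np

rng = np.random.default_rng(4)

def p_correct(ability, difficulty):
    """Item-response style probability of a correct classification."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

n_items = 300
truth = rng.integers(0, 2, size=n_items)            # true binary labels
ability = np.array([-0.5, 0.0, 0.3, 1.5, 2.0])      # five participants (assumed)
difficulty = rng.normal(0.0, 1.0, size=n_items)     # per-image difficulty (assumed)

correct = rng.random((ability.size, n_items)) < p_correct(ability[:, None], difficulty)
labels = np.where(correct, truth, 1 - truth)        # an incorrect answer flips the label

majority = (labels.mean(axis=0) > 0.5).astype(int)
weights = p_correct(ability, 0.0)                   # weight by accuracy on an average item
weighted = ((weights[:, None] * (2 * labels - 1)).sum(axis=0) > 0).astype(int)
print("majority vote accuracy:   ", (majority == truth).mean())
print("ability-weighted accuracy:", (weighted == truth).mean())
```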

    clusterBMA: Bayesian model averaging for clustering

    Within the ensemble clustering literature, various methods have been developed to combine inference across multiple sets of results for unsupervised clustering. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models and offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, which is used to weight the results from each model. From a consensus matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features, including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name.
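    A minimal sketch of the consensus-matrix step only (not the clusterBMA package, and omitting the final symmetric simplex matrix factorisation): each candidate clustering contributes a co-assignment matrix, and the matrices are averaged using weights that stand in for the approximate posterior model probabilities. The clusterings and weights below are invented.

```python
# Build a weighted consensus matrix from two candidate clusterings.
import numpy as np

def coassignment(labels: np.ndarray) -> np.ndarray:
    """1 where two observations share a cluster, 0 otherwise."""
    return (labels[:, None] == labels[None, :]).astype(float)

# two candidate clusterings of six observations (assumed results)
clusterings = [np.array([0, 0, 1, 1, 2, 2]),
               np.array([0, 0, 0, 1, 1, 1])]
weights = np.array([0.7, 0.3])   # stand-ins for approximate posterior model probabilities

consensus = sum(w * coassignment(c) for w, c in zip(weights, clusterings))
print(np.round(consensus, 2))
```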

    Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences

    Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering the unprecedented opportunity to evaluate classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize classification robustness reflect well the detailed functional specificity revealed by the experimental data. In some of these classifications, 73–83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.
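    The 73–83% figure refers to sequence-based subfamilies that exactly match, or are completely contained in, a function-based subtype. The sketch below computes that containment fraction for hypothetical toy groupings; the domain identifiers and groupings are invented and do not reproduce the paper's classifications.

```python
# Fraction of sequence-based subfamilies exactly matching, or completely
# contained in, a function-based subtype.
def containment_fraction(seq_subfamilies, func_subtypes):
    func_sets = [set(s) for s in func_subtypes]
    contained = sum(any(set(fam) <= f for f in func_sets) for fam in seq_subfamilies)
    return contained / len(seq_subfamilies)

# hypothetical homeodomain identifiers grouped two ways
seq_subfamilies = [{"Hoxa1", "Hoxb1"}, {"Nkx2-5", "Nkx2-6"}, {"Pax6", "Otx1"}]
func_subtypes = [{"Hoxa1", "Hoxb1", "Hoxd1"}, {"Nkx2-5", "Nkx2-6"}, {"Pax6"}, {"Otx1"}]
print(f"{containment_fraction(seq_subfamilies, func_subtypes):.0%} of subfamilies contained")
```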

    The multifaceted roles of perlecan in fibrosis

    Perlecan, or heparan sulfate proteoglycan 2 (HSPG2), is a ubiquitous heparan sulfate proteoglycan that has major roles in tissue and organ development and wound healing by orchestrating the binding and signaling of mitogens and morphogens to cells in a temporal and dynamic fashion. In this review, its roles in fibrosis are examined by drawing upon evidence from tissue and organ systems that undergo fibrosis as a result of an uncontrolled response to either inflammation or traumatic cellular injury, leading to overproduction of a collagen-rich extracellular matrix. The review focuses on examples of fibrosis in lung, liver, kidney, skin, neural tissues and blood vessels, and on the link between fibrosis and the expression of perlecan in each organ system.